Relative synonymous codon usage and codon pair analysis of depression associated genes

Depression negatively impacts mood, behavior, and mental and physical health. It is the third leading cause of suicides worldwide and leads to decreased quality of life. We examined 18 genes available at the Genetic Testing Registry (GTR) of the National Center for Biotechnology Information to investigate molecular patterns present in depression-associated genes. Different genotypes and differential expression of these genes are responsible for ensuing depression. The present study investigated codon patterns, which might play important roles in modulating the expression of depression-associated genes. Of the 18 genes, seven and two genes tended to be down- and up-regulated, respectively; for the remaining genes, different genotypes arising from SNPs were responsible, alone or in combination with differential expression, for different conditions associated with depression. Codon context analysis revealed the abundance of identical GTG-GTG and CTG-CTG pairs, and the rarity of methionine-initiated codon pairs. Information on codon usage, preferred codons, rare codons, and codon context might be used in constructing a deliverable synthetic construct to correct gene expression levels that are altered in the depressive state. Other molecular signatures also revealed the role of evolutionary forces in shaping codon usage.


Compositional analysis
Depression-related genetic tests were searched in the Genetic Testing Registry (GTR) of the National Center for Biotechnology Information. The tests gtr/tests/508,961 by Assurex Health Inc, gtr/tests/569,407 by Genomind Professional PGx Express CORE Anxiety & Depression, and gtr/tests/579,485 by Intergen Genetic Diagnosis and Research Centre presented a panel of 18 genes that are evaluated for the presence of depressive disorders. Different genotypes of these genes are available based on the SNPs; however, we accessed only the 'reference' coding gene sequences from the NCBI nucleotide database. Although a larger number of genes is preferable to support statistical analyses, this was the total number of genes available in the accessible panels targeted to a depression diagnosis and, hence, 18 gene sequences were obtained (for specifics, see Table 1).
Our compositional analysis of genes involved in depression revealed that GC3 content, which is an indicator of codon bias 52, was highest amongst all compositional parameters. Average %A, %C, %T and %G composition was 24.39%, 26.17%, 23.66% and 25.75%, respectively. In order of occurrence, the nucleotides rank %C > %G > %A > %T. %T1 (18.67%) was the lowest at the first codon position, %G2 (17.82%) at the second, and %A3 (16.42%) at the third; %GC3 content varied between 41.80% and 83.82%.
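The compositional parameters named above can be illustrated with a minimal Python sketch. It assumes the usual definitions (%GC3 as the share of G or C at third codon positions, %GC12 as the mean over the first two positions); the function names and the toy sequence are hypothetical and are not taken from the study's pipeline.

```python
from collections import Counter


def split_codons(cds):
    """Split a coding sequence into complete codons; trailing bases are dropped."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def composition(cds):
    """Return overall %A, %C, %G, %T plus %GC12 and %GC3 for one coding sequence."""
    codon_list = split_codons(cds)
    if not codon_list:
        raise ValueError("sequence contains no complete codon")
    seq = "".join(codon_list)
    counts = Counter(seq)
    result = {f"%{base}": 100.0 * counts[base] / len(seq) for base in "ACGT"}
    n = len(codon_list)
    # Number of G or C bases at codon positions 1, 2 and 3.
    gc_at = [sum(codon[pos] in "GC" for codon in codon_list) for pos in range(3)]
    result["%GC12"] = 100.0 * (gc_at[0] + gc_at[1]) / (2 * n)
    result["%GC3"] = 100.0 * gc_at[2] / n
    return result


# Made-up five-codon sequence, for illustration only.
toy = composition("ATGGCCCTGGTGTAA")
print(round(toy["%GC3"], 2), round(toy["%GC12"], 2))
```

Position-specific values such as %T1 or %A3 may be normalized differently in the original analysis, so this sketch should be read as one plausible convention rather than the exact calculation.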

GC content (GC12 and GC3) effects on gene length
Coding-sequence length carries evolutionary meaning in relation to compositional variation in GC content. An analysis of genome databases revealed that the longest coding sequences in vertebrates and prokaryotes are GC rich, whereas shorter coding sequences are GC poor 53. A Pearson correlation coefficient (r) was obtained based on the linear correlation between the two data sets. This analysis revealed a lack of correlation between length and the GC components %GC12 and %GC3, which indicated that %GC content does not depend on gene length. Most of the 18 evaluated genes had a size between 1350 and 1650 bp. Furthermore, in all the genes, %GC3 content was higher than %GC12. Gene lengths were normalized by dividing them by 100 to be comparable with the percent GC composition. Normalized gene length and %GC3 content are depicted in Fig. 1. To evaluate correlation trends between length and %GC content, we additionally appraised the correlation between the adjusted length and %GC content of a set of 62 housekeeping genes. We found that length negatively correlates with %GC3 (Pearson correlation coefficient r = -0.263, p < 0.05) in housekeeping genes (Supplementary Table S1). www.nature.com/scientificreports/
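As an illustration of the length versus %GC3 correlation, the following sketch computes a Pearson coefficient in plain Python. The gene lengths and %GC3 values are hypothetical placeholders, not data from the study, and no p-value is computed here.

```python
import math


def pearson_r(x, y):
    """Pearson linear correlation coefficient between two equally long samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values, for illustration only.
lengths_bp = [1350, 1500, 1650, 2100]
gc3_percent = [62.0, 55.5, 71.2, 48.9]

# Length divided by 100, as described for the plotted data.
scaled_lengths = [length / 100 for length in lengths_bp]

r_raw = pearson_r(lengths_bp, gc3_percent)
r_scaled = pearson_r(scaled_lengths, gc3_percent)
print(r_raw, r_scaled)
```

Because Pearson's r is invariant to linear rescaling, dividing the lengths by 100 changes only how the data look on a shared axis, not the correlation itself.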

Dinucleotide ratio analysis
Dinucleotides CpG, GpT, and TpA were either underrepresented or randomly represented (odds ratio < 1.6) in all the genes examined. On the other hand, ApG, CpT, GpA, and TpG dinucleotides were either overrepresented or randomly represented (odds ratio > 1.6).
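A dinucleotide odds ratio is commonly defined as the observed dinucleotide frequency divided by the product of the frequencies of its two constituent nucleotides. The sketch below assumes that definition and counts overlapping dinucleotides; the function name is hypothetical, and the cutoffs used to call over- or underrepresentation are left to the reader.

```python
from collections import Counter


def dinucleotide_odds_ratios(seq):
    """Observed/expected ratio f_xy / (f_x * f_y) for every dinucleotide present."""
    seq = seq.upper().replace("U", "T")
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two nucleotides")
    n = len(seq)
    base_freq = {base: count / n for base, count in Counter(seq).items()}
    # Overlapping dinucleotides: positions (0,1), (1,2), (2,3), ...
    pair_counts = Counter(seq[i:i + 2] for i in range(n - 1))
    total_pairs = n - 1
    return {
        pair: (count / total_pairs) / (base_freq[pair[0]] * base_freq[pair[1]])
        for pair, count in pair_counts.items()
    }


ratios = dinucleotide_odds_ratios("ACAC")
print(ratios)
```

Dinucleotides that never occur are simply absent from the returned dictionary; a caller that needs all 16 entries can fill the missing ones with 0.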

RSCU analysis shows preference of GC ending codons
The overall RSCU analysis revealed that GC ending codons were preferred over AT ending codons. CTG and GTG were the most overrepresented codons, whereas TTA, GTA, ATA, CTA, CGT, ACG, GCG, CCG, and TCG were the most underrepresented (Fig. 2). RSCU values of depression associated genes are shown in Table 2. To determine the correlation trends between length and %GC content, we further sought a correlation between adjusted length and %GC content of a set of 62 housekeeping genes. We also compared RSCU values of depression-associated genes with the RSCU values of housekeeping genes and, based on a t-test, it was evident that codon usage was significantly different (t = 3.58, p < 0.0001) for codon GTA. In addition, codons GTG, CCC, GAT, and GAC differed at a 10% significance level (Table 3).

Table 1. Depression associated genes evaluated for codon pattern analysis: their regular functions and roles during depression along with their modulated expression and SNP data.
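For readers unfamiliar with RSCU, a minimal sketch follows. It assumes the standard definition, in which the observed count of a codon is divided by the mean count of all codons synonymous with it, using the standard genetic code and excluding Met, Trp and stop codons (59 codons remain). The function name and the toy input are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_list):
    """RSCU for the 59 degenerate sense codons, pooled over the given sequences."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")
        for i in range(0, len(cds) - len(cds) % 3, 3):
            counts[cds[i:i + 3]] += 1
    # Group synonymous codons by amino acid, skipping Met, Trp and stops.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa not in "MW*":
            families[aa].append(codon)
    values = {}
    for family in families.values():
        total = sum(counts[codon] for codon in family)
        for codon in family:
            # NaN marks amino acids that never occur in the input.
            values[codon] = len(family) * counts[codon] / total if total else float("nan")
    return values


toy = rscu(["CTGCTGCTGTTA"])  # made-up sequence: three CTG and one TTA
print(toy["CTG"], toy["TTA"])
```

An RSCU of 1 means a codon is used exactly as often as expected under equal synonymous usage; values above and below 1 indicate preferred and avoided codons, respectively.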

Relationship between codon bias, nucleotide skews and gene length
CUB had a significant positive association (r = 0.863, p < 0.001) with the length of proteins. We also investigated the relationship between protein length and protein expression level, but no correlation was observed. Nucleotide disproportion is referred to as skew. Various skews, including AT skew, GC skew, purine skew, pyrimidine skew, keto skew, and amino skew, are available to assess the effects of nucleotide disproportion on a parameter under consideration. Herein, we compared the effects of various skews on CUB, and found that only the pyrimidine, amino and keto skews had a significant positive correlation with scaled chi-square (SCS) values (r = 0.767, p < 0.05; r = 0.756, p < 0.01; r = 0.793, p < 0.01; Spearman correlation with Bonferroni correction). The nucleotide skew values are given in Table 4.
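The six skews can be sketched as follows. The text does not spell out the formulas, so the definitions below are assumptions based on commonly used forms: AT skew (A − T)/(A + T), GC skew (G − C)/(G + C), purine skew (A − G)/(A + G), pyrimidine skew (T − C)/(T + C), keto skew (G − T)/(G + T) and amino skew (A − C)/(A + C). The function names are hypothetical.

```python
from collections import Counter


def _skew(x, y):
    """Generic skew (x - y) / (x + y); NaN when both counts are zero."""
    return (x - y) / (x + y) if (x + y) else float("nan")


def nucleotide_skews(seq):
    """Six nucleotide skews under the ASSUMED definitions given in the lead-in."""
    counts = Counter(seq.upper().replace("U", "T"))
    a, t, g, c = counts["A"], counts["T"], counts["G"], counts["C"]
    return {
        "AT skew": _skew(a, t),
        "GC skew": _skew(g, c),
        "purine skew": _skew(a, g),
        "pyrimidine skew": _skew(t, c),
        "keto skew": _skew(g, t),
        "amino skew": _skew(a, c),
    }


print(nucleotide_skews("AATG"))  # made-up sequence
```

Each skew ranges from −1 to 1, with 0 indicating equal usage of the two nucleotides being compared.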

CUB and gene expression profiling
The codon adaptation index (CAI) is used as a quantitative method of predicting the level of expression of a gene based on its codon sequence 54. In the study of Sahoo et al. 55, predicted highly expressed (PHE) genes in Arabidopsis thaliana were critically analyzed using expression data from Gene Expression Omnibus (GEO) datasets, where protein expression levels are quantified by RMA (Relative Molecular Abundance) signal intensity. The Pearson correlation coefficient between RMA and CAI showed a statistically significant correlation (r = 0.47, p < 0.05). In another experiment conducted by Guimaraes et al. 56, protein abundance (PA) was measured for > 800 genes in. CAI was found to be significantly correlated with PA after controlling for mRNA abundance (r = 0.3526, P ≤ 0.001). These examples indicate that CAI might be conveniently used as a surrogate for protein expression. Thus, we used CAI values as expression data for depression genes (calculated through the CAIcal server, developed by Puigbo and colleagues (2008) 57) to correlate with their respective gene lengths. The CAI values of the genes associated with depression ranged from 0.713 (UGT2B15) to 0.85 (CYP1A2). The CAI value has a significant negative association with the SCS value (r = − 0.910, p < 0.001), which indicates that low codon bias is present in highly expressed genes 58. A higher CAI indicated a relatively high protein expression level. Most of the AT ending codons have a significantly negative relationship with CAI, except for GTA, CGT, and GCT (which bear no relationship with CAI). In contrast, most GC ending codons had a significant positive relationship with CAI, except for GTC, CTC, ACG, and TCG (which showed no relationship with CAI). The only exception was codon TTG, which had a significant negative relationship with CAI.
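To make the CAI concrete, the sketch below assumes the classical definition: each codon receives a relative adaptiveness w (its count in a reference set divided by the count of the most used synonymous codon), and the CAI of a gene is the geometric mean of w over its codons. This is a simplified illustration and not the CAIcal implementation; in particular, codons with zero weight are skipped here, whereas other implementations substitute a small pseudo-count. All names and the toy reference are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def split_codons(cds):
    """Split a coding sequence into complete codons; trailing bases are dropped."""
    cds = cds.upper().replace("U", "T")
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def relative_adaptiveness(reference_cds):
    """w = codon count / count of the most used synonymous codon in the reference."""
    counts = Counter(c for cds in reference_cds for c in split_codons(cds))
    best = defaultdict(int)
    for codon, aa in CODON_TABLE.items():
        best[aa] = max(best[aa], counts[codon])
    return {
        codon: counts[codon] / best[aa]
        for codon, aa in CODON_TABLE.items()
        if aa not in "MW*" and best[aa] > 0
    }


def cai(cds, weights):
    """Geometric mean of w over the codons of cds that have a non-zero weight."""
    logs = [math.log(weights[c]) for c in split_codons(cds) if weights.get(c, 0) > 0]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


# Made-up reference set: three CTG and one TTA codon.
w = relative_adaptiveness(["CTGCTGCTGTTA"])
print(cai("CTGTTA", w), cai("CTGCTG", w))
```

A gene built only from the most used synonymous codons of the reference set scores 1, and the score falls as rarer codons are used.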

Codon context analysis revealed a context between stop codon UGA and other amino acid encoding codons
Whereas codon bias is the preferential use of particular codons, codon context refers to the presence of sequential pairs of codons in a gene 59. In this light, codon context analysis was undertaken on the 18 genes associated with depression. Codon context, additionally, is a feature that influences gene expression independently of codon bias 60. The trend for codon context variation is depicted as a matrix of 64 × 64 codons. The total number of codon pairs observed in the 18 genes is 2047. As illustrated in Fig. 3, highly used codon pairs are displayed in green, whereas lesser-used codon pairs are presented in red. The rows display 5' codons, whereas the columns display 3' codons (Fig. 3). It is clear from the figure that stop codon UAG exhibited high context with many of the amino acid encoding codons. In addition, all kinds of contexts (positive, negative and no context) were observed between the codons of the genes examined.

Arginine or proline initiated codon pairs are abundant
Out of the 15 top overrepresented codon pairs, only two contained either CpG or TpA. Out of 540 rare codon pairs (absent codon pairs are excluded), a maximum of 75 were arginine initiated, followed by 65 proline-initiated pairs. Methionine-initiated codon pairs were the rarest (9 only). Among the 15 most preferred codon pairs, a maximum of 4 were leucine initiated (Table 5). These results indicate a distinct pattern of codon pair preference or avoidance due to multiple evolutionary forces acting on depression-associated genes.
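The raw tally underlying a codon pair analysis can be sketched as below. This only counts adjacent codon pairs; it does not reproduce the statistical residuals that dedicated codon context software reports, and all function names are hypothetical.

```python
from collections import Counter


def codon_pair_counts(cds_list):
    """Count adjacent (5' codon, 3' codon) pairs over a list of coding sequences."""
    pairs = Counter()
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")
        codon_list = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
        pairs.update(zip(codon_list, codon_list[1:]))
    return pairs


def identical_pairs(pairs):
    """Subset of pairs in which both codons are the same, e.g. GTG-GTG."""
    return {pair: n for pair, n in pairs.items() if pair[0] == pair[1]}


def pairs_by_five_prime_codon(pairs):
    """Total number of pairs initiated by each 5' codon."""
    totals = Counter()
    for (five_prime, _three_prime), n in pairs.items():
        totals[five_prime] += n
    return totals


# Made-up sequence: ATG GTG GTG CTG CTG
pairs = codon_pair_counts(["ATGGTGGTGCTGCTG"])
print(pairs.most_common(3))
print(identical_pairs(pairs))
```

Grouping the 5' codons by the amino acid they encode would then give counts of, for example, arginine- or methionine-initiated pairs.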

Nucleotide disproportion influence on protein indices
We examined six nucleotide skews, namely AT skew, GC skew, purine skew, pyrimidine skew, keto skew, and amino skew. We performed Pearson linear correlation analysis between the nucleotide skews and protein properties to determine whether nucleotide disproportion influences physical protein properties (Table 6). Amino skew did not correlate with any of the protein properties examined. The results are suggestive of an effect of nucleotide disproportion on protein properties.

Translation selection P2 is suggestive of a role of selectional forces
Translation selection (P2) values indicate the binding strength between the codon and anticodon. P2 was determined from the WWC, SSC, WWU, and SSU values using the average RSCU values, and the value of 1.01 obtained indicates strong selectional forces.

Neutrality analysis confirms major role of selectional forces
Regression analysis between the %GC3 and %GC12 provided a slope value of 0.3276, which indicated that relative neutrality was 32.76% and the relative constraint was 67.24% (Fig. 4A). This signifies that selectional force (67.24%) was dominant over mutational force (32.76%). The graph also indicates that %GC3 is responsible for

Parity analysis revealed preference of T and C over A and G nucleotides
Parity analysis determines the bias between A/T and C/G at the third codon position. At the center of the plot, where both axis values are 0.5, A = T and C = G. In the present study, the average position was x = 0.469 ± 0.050 (AT bias) and y = 0.439 ± 0.054 (GC bias). A bias value of less than 0.5 indicates a preference for pyrimidines over purines 61. Herein, our analysis indicated that thymine is preferred over adenine, and that cytosine is preferred over guanine (Fig. 4B).
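The two parity coordinates can be sketched as below, with AT bias taken as A3/(A3 + T3) and GC bias as G3/(G3 + C3). As a simplifying assumption, the sketch uses the third position of every codon, although parity analyses are often restricted to four-fold degenerate codon families; the function name is hypothetical.

```python
def parity_bias(cds):
    """Return (AT bias, GC bias) computed from third codon positions.

    AT bias = A3 / (A3 + T3) and GC bias = G3 / (G3 + C3); a value of 0.5
    means the two nucleotides are used equally, and a value below 0.5 means
    the pyrimidine (T or C) is used more often than the purine (A or G).
    """
    cds = cds.upper().replace("U", "T")
    third = [cds[i + 2] for i in range(0, len(cds) - len(cds) % 3, 3)]
    a3, t3, g3, c3 = (third.count(base) for base in "ATGC")
    at_bias = a3 / (a3 + t3) if (a3 + t3) else float("nan")
    gc_bias = g3 / (g3 + c3) if (g3 + c3) else float("nan")
    return at_bias, gc_bias


# Made-up sequences, for illustration only.
print(parity_bias("GCAGCTGCGGCC"))
print(parity_bias("GCTGCTGCAGCC"))
```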

Relationship of codon bias with %GC3 content and gene expression
An ENc (effective number of codons) versus GC3 plot is generally used to study the effect of %GC3 composition, which reflects both mutational force and composition, on codon bias. If codon choice is constrained by mutational force alone, all the data points will lie on or just below the expected curve, whereas if selection is operating, the data points lie well below the curve 62. In the present study, only a few points were present near the curve. The rest of the data points lay below it, suggesting selection as a dominant force in shaping codon usage in depression-associated genes (Fig. 4C). Furthermore, we investigated the effect of codon bias on gene expression by regressing CAI on ENc. Since ENc is a non-directional measure of codon bias, in which lower values denote stronger bias, a negative correlation between them (Pearson correlation r = − 0.904, p < 0.0001) indicates that gene expression increases with increasing codon bias. Overall, 81.81% of the variation in gene expression is attributed to codon bias (Fig. 4D).
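Two calculations behind these plots can be sketched briefly. The expected curve is assumed to follow Wright's formula, ENc = 2 + s + 29/(s² + (1 − s)²), where s is the GC3 fraction, and the regression is an ordinary least-squares fit. The function names and the toy data are hypothetical.

```python
def expected_enc(gc3_fraction):
    """Expected ENc under GC3 composition alone (s given as a fraction, 0 to 1)."""
    s = gc3_fraction
    return 2.0 + s + 29.0 / (s ** 2 + (1.0 - s) ** 2)


def least_squares(x, y):
    """Ordinary least-squares fit y = intercept + slope * x; returns slope, intercept, R^2."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return slope, intercept, 1.0 - ss_res / ss_tot


print(expected_enc(0.5))

# Toy data lying exactly on a line with slope 0.3276.
gc3 = [40.0, 50.0, 60.0, 70.0]
gc12 = [0.3276 * value + 30.0 for value in gc3]
print(least_squares(gc3, gc12))

# For a simple linear regression, R^2 equals the squared Pearson r.
print((-0.904) ** 2)
```

Squaring the rounded coefficient r = −0.904 gives about 0.817, close to the 81.81% reported, with the small difference presumably due to rounding of r. The same least-squares helper applied to %GC12 against %GC3 yields the neutrality slope, whose value (0.3276 in this study) is read as the relative neutrality.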

Effects of mutation pressure on codon composition is highest for G and least for T nucleotide
A mutation at the third position of a codon often does not change the amino acid encoded; because of this redundancy of the genetic code, the third position is called the silent position of the codon. Nevertheless, this position is affected by mutational force since, here, a mutation changes the nucleotide but not the meaning of the codon. The effect of mutational force on composition was 92.55%, 84.28%, 88.9%, and 93.25% for nucleotides A, T, C, and G, respectively (Fig. 5). Thus, mutational force contributed most to determining the composition of nucleotide G (93.25%) and least to that of nucleotide T (84.28%).

Principal component analysis
Principal component analysis was undertaken using the RSCU values of 59 codons. Figure 6 represents the correspondence analysis and reveals that the first two axes account for a substantial share of the variation (50.46% and 10.88%, respectively). The third and fourth axes account for 6.64% and 5.78% of the variation, respectively, and the contribution of the first four axes is 73.76%. Based on the loading values, codons AGA, CTG, CGC, and GGA influence CUB the most in depression-associated genes. The first and second principal component (PC1 and PC2) scores of different genes are provided in Fig. 6.

Discussion
Depression is a disorder with a wide range of symptoms. In evaluating patients with depression, GWAS has revealed a high degree of polygenicity that underlies the mental illness and related complex phenotypes, and has discovered that many SNPs with relatively small effect size, when combined, potentially contribute to phenotype development 4. Polygenicity includes some genetic heterogeneity; affected people may have different combinations of risk alleles, and unaffected people will also carry many of these variants. Depression is clearly a heterogeneous condition, as evidenced by the fact that two people can be diagnosed with depression but have no common symptoms. In addition, neurodegenerative disorders 63 can potentially contribute to depression 64. In this light, various studies have been undertaken to understand the physiology and genetics behind depression. To our knowledge, however, no previous work has described the compositional features and codon usage patterns of genes associated with depression. Hence, the present research focuses on the codon usage of genes associated with depression. Our evaluation used a panel of 18 genes that have been associated with depression (Table 1). Although this number is not optimal and may be considered undersized for statistical analyses, it is the maximum number of genes available for depression detection from the NCBI Genetic Testing Registry. The products of these genes are involved in multiple biological functions and pathways (given in Table 1), and altered expression levels or SNPs can lead to various genotypes that result in diseased conditions or different responses to medications.
Knowledge of nucleotide composition is essential for understanding codon usage, since many of the parameters associated with codon usage indices, including nucleotide skew, neutrality, and parity plots, are composition dependent. Compositional analysis revealed that %C occurrence was highest and %T occurrence was lowest. The %GC3 content was the most variable compositional parameter and varied between 41.80 and 83.82%.
CAI is a measure of gene expression level that compares the codon composition of a gene with a reference set of genes 65. Our study found CAI values ranging between 0.713 (UGT2B15) and 0.85 (CYP1A2). In Escherichia coli (E. coli), which has long been regarded as a model organism in the study of CUB, the highest CAI value of 0.85 was for the lpp gene, one of the most abundant genes, encoding an outer membrane lipoprotein 66. Hence, it can be speculated that the CAI value of 0.85 (CYP1A2 gene) in our depression study is likely also associated with high-level expression. The relationship between the CAI and expression value can be better understood in the light of an experiment conducted by Dos Reis et al. 58, who distributed the E. coli genes into three groups based on codon usage and expression level data obtained from microarray experiments. They found a positive relationship between the CAI value and expression level in one of these groups. In another group, the genes with low CAI were highly expressed, which contradicts the set paradigm of CUB, where optimal codon usage leads to higher CAI. However, the results are still explainable based on the mutation-selection balance hypothesis of codon usage. High CAI values were also obtained in the present study, indicating a higher expression level. However, other dynamic factors, including mutation-selection balance, could contribute to the expression. CAI is associated with compositional constraints and can potentially show all relationships (negative, positive, and no correlation). Hence it can be inferred from this study that the gene expression level depends on the base composition. Such a phenomenon could reflect the compositional pressure on CUB, which ultimately drives the gene expression. Our view is supported by the results of Sahoo et al. 67, who described the critical role of codon composition in regulating the gene expression profile in the Arabidopsis thaliana genome (a small plant from the mustard family native to Eurasia and Africa) based on the score of modified relative codon bias. A study by Franzo and colleagues 68, likewise, demonstrated that CUB is highly affected by nucleotide composition in an evaluation of an infectious bronchitis virus. The genes associated with depression showed an interesting pattern related to nucleotide composition and CUB. After comparing compositional constraint relationships with SCS, one of the measures of CUB, we found a negative relationship of SCS with G nucleotides (overall %G, %G2, %G3 and %GC2) only. This signifies the importance of the G nucleotide in determining codon usage. Codon usage bias is affected by several factors, and gene length is one of them. Based on a study on codon usage in 8,133, 1,550, and 2,917 genes, respectively, from Caenorhabditis elegans, Drosophila melanogaster and Arabidopsis thaliana, a significant negative linkage between codon usage and protein length was reported 69. On the other hand, Eyre-Walker 70 found a positive association between codon usage and gene length, suggesting selection against missense errors in E. coli. In this light, it can be inferred that length can have both positive and negative impacts on CUB, depending on the model organism under evaluation. CUB and protein length were positively correlated with GC3 content and the correlation was stronger for %GC12 content in all the proteins examined, without any exception. Our results agree with Khandia et al. 71, who found that, in all the proteins whose size ranged between 150 and 3000 amino acids in a study focused on primary immunodeficiency and cancer, GC12 content was lower than GC3 without any exception.
In our current study, dinucleotides CpG, GpT, and TpA were underrepresented, whereas ApG, CpT, GpA, and TpG were overrepresented. In the human ORFome (open reading frames within a genome), CpG and TpA dinucleotides show the highest level of suppression, and GpT has the third lowest abundance 72. Thus, it appears that the depression-related gene set also follows the common trend of odds ratios present in the human ORFome. CpG dinucleotides occur at a low frequency in the human genome, and this is attributed to a higher mutation rate of 5-methylated CpG to TpG; as a result, the TpG dinucleotide is increased 73. Contrary to the results of Kunec and Osterrieder 72 and to ours, Franzo et al. 68 found an overrepresentation of the GpT dinucleotide. The overrepresentation of ApG, CpT, GpA, and TpG partially agrees with Franzo et al. 68, who reported ApG and TpG dinucleotide pairs overrepresented in the whole genome, and CpT in the polyprotein region only, in infectious bronchitis virus. Such results suggest that the odds ratio might serve as a molecular signature 74.
RSCU analysis indicated that GC ending codons were preferred over AT ending codons; however, parity analysis indicated that T and C nucleotides are preferred over A and G nucleotides. In accordance with the results of the dinucleotide analysis, codons encompassing TpA and CpG dinucleotides (TTA, GTA, ATA, CTA, CGT, ACG, GCG, CCG, TCG) were underrepresented. The overrepresentation of CTG and GTG codons observed in the present study matches the results of Khandia et al. 71, who found overrepresentation of CTG and GTG in 78.33% and 68.33% of genes common to primary immunodeficiency and cancer, respectively. This abundance of CTG and GTG codons might have come from the conversion of CpG to TpG dinucleotide, an integral part of the CTG and GTG codons. Such results suggest that RSCU bias is the result of dinucleotide bias 72, arising from intrinsic characteristics and evolutionary forces such as selection and mutation 75.
The codons also influence the gene expression level, and it was observed that most AT-ending codons have a negative association with CAI. In contrast, most GC ending codons have a positive association with CAI. The only exception to this was the codon TTG, which is negatively associated with CAI. The two codons, AGG and TTG, behave differently in the human genome. When the other C and G ending codons are decreased, these two increase 76, which is probably why they are inversely affected by CAI.
Compositional properties affect codon usage and nucleotide disproportion too. Nucleotide disproportion (skews) also affects CUB and, in the Nipah virus, an association between CUB and nucleotide skew has similarly been reported 77. We found that CUB is affected by purine skew. Various skews significantly affected different protein indices, which is also suggestive of the role of compositional constraints on the physical properties of proteins. In mitochondrial NADH dehydrogenase genes (ND genes, encoding for respiratory complexes) of Amphibia, amino skew, purine skew, and keto skew showed a significant correlation with ENc, thereby demonstrating that skewness can potentially affect the CUB 78. In the genes associated with depression, %GC12 and %GC3 are found to be significantly positively correlated (r = 0.846, p < 0.001), and this correlation is suggestive of the role of mutational force in shaping codon usage 79.
The CUB and codon context bias are important parameters to be considered during heterologous protein expression 80. In our study, it was evident that few of the codons remain minimally used, and this is in accord with the studies of Chakraborty et al. 81 on codon context in leukemia-associated genes. The identical codon pairs GTG-GTG and CTG-CTG were the most favored codon pairs in the depression-associated gene 82.
In the present study, we performed gene correlation analysis to determine whether the genes involved in similar functions share similar attributes. Gene correlation analysis was undertaken based on RSCU to determine whether genes have a similar kind of codon usage. The data indicated that the 18 genes evaluated displayed similar codon choices, as evidenced by the positive relationships among the genes in the study. However, the correlation value varied, and a few genes did not display correlation. When the gene correlation was studied at the protein indices level, all genes were found to be positively correlated except for the CYP3A4 gene, which showed no correlation with any of the genes. Such analysis helps determine how genes involved in one kind of ailment may be similar based on different parameters, and we found similarities between them based on RSCU and protein indices.
Translational selection (P2) refers to the strength of the binding force between the codon and anticodon, and indicates selectional pressure. In the four cotton species (G. arboreum, G. raimondii, G. hirsutum and G. barbadense), P2 values were more than 0.5. In this light, our result indicates the dominant role of selection over mutation pressure in codon usage 83.
Upon evaluating the effects of mutational forces on overall nucleotide composition, it was evident that mutational pressure affected nucleotides A and G equally (approx. 57%), whereas nucleotide C was least affected. Principal component analysis indicated that the codon usage of the genes is majorly influenced by G and C ending codons. Overall analysis revealed the importance of compositional, mutational, and selectional pressure. However, the role of selection pressure was dominant over the others 84. There are a few striking similarities in neurobiological alterations between depressive disorders and neurodegeneration, as in Alzheimer's, Parkinson's, and Huntington's disease 64. In the study of Khandia et al. 63, codon patterns were studied in neurodegeneration-related gene sets, with minor overlap with the present gene set, and gene composition, dinucleotide analysis, RSCU, CAI, and different protein indices were evaluated. In the future, parameters like codon pair occurrence, codon context, and the effects of codon bias on gene expression might be investigated in such genes.
The present study investigated different molecular patterns and relative synonymous codon usage in 18 depression-associated genes; of these, nine genes showed modulation of gene expression during the depressive state. BDNF, COMT, CYP2C9, CYP3A4, HTR2A, SLC6A4, and MTHFR showed reduced expression, while UGT1A1 and CYP2C19 showed enhanced expression. For the other genes, different genotypes (related to SNPs) associated with depression or response to depression therapy could not be included in the study, since the SNPs responsible for depression might be present in promoter, repeat, exon, intron, or leader sequences 85, whereas the analysis of codon usage, codon pairs, CAI, and other patterns is intended only for protein-encoding sequences. As a consequence, we acquired only the coding sequences of the genes examined, which were available as RefSeqGene in the NCBI database. For the seven genes whose expression is downregulated during depression, the expression level might theoretically be corrected by introducing a copy of the gene (for example, using current gene therapy methods such as CRISPR-Cas) in which codons with lower RSCU values are replaced by synonymous codons with higher RSCU values, so as to enhance gene expression as predicted by the CAI. The current study thereby opens potential new hypotheses and avenues for future research.

Conclusion
In relation to the CUB evaluation of depression-associated genes, compositional analysis revealed that %C was highest, followed by %G, %A, and %T. Among all compositional constraints, %GC3 was the most variable. All the 18 genes examined in the study had high CAI values, indicating high-level gene expression. Additionally, within the present study, the gene expression level was driven by compositional constraints. Interestingly, CUB in depression-linked genes is associated solely with overall G nucleotide composition and composition at the second and third codon positions, pointing to the effect of G nucleotide compositional constraint on CUB. Codon bias was positively correlated with the length of the gene, indicating increased bias with the length of the protein. CpG, GpT, and TpA dinucleotides were underrepresented, whereas ApG, CpT, GpA, and TpG dinucleotides were overrepresented. The dinucleotide pattern was also seen in the RSCU values of codons, where all CpG and TpA containing codons have low RSCU values and are underrepresented. Likewise, the overrepresentation of the TpG dinucleotide is reflected in the overrepresented codons CTG and GTG. Among the nucleotide skews evaluated, purine skew was found to affect CUB. A highly significant positive relationship between GC3 and GC12 indicated the role of mutational force in shaping codon usage. The neutrality plot exhibited the prominent role of the selection force in shaping codon utilization. The parity plot results further supported this notion, with T and C nucleotides preferred over A and G nucleotides. Based on translation selection (P2) analysis, it could be inferred that the genes had low codon bias. Gene correlation analysis based on RSCU revealed a variable degree of positive correlation among genes showing a similar codon usage pattern, which the PCA further established. All the genes clustered together, indicating a similar codon choice. Codon context analysis revealed the abundance of the identical codon pairs GTG-GTG and CTG-CTG, which enhance translational rates and result from selection forces. Based on the study, a synthetic construct could potentially be synthesized with the information on relative synonymous codons, codon bias, codon pair bias, and CAI in hand. Such a construct might help modulate gene expression. For example, for the seven genes studied here that are downregulated during depression, restoring an overexpressing copy within the body through gene therapy might potentially curb the ailment, which provides a hypothesis and potential avenue for future research.

Translational selection
The P2 analysis indicates the strength of codon-anticodon interaction and indicates translation efficacy when information on a preferred codon set is unknown 83.
Translational selection P2 was calculated using the formula:
\[ P2 = \frac{WWC + SSU}{WWY + SSY} \]
where W = A or U, S = C or G, and Y = C or U. Values above 0.5 indicate a bias favoring translational selection 93.
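A minimal sketch of the P2 calculation is given below, assuming the commonly used form P2 = (WWC + SSU)/(WWY + SSY), in which each term is the summed usage of all codons matching that pattern. The input is a hypothetical dictionary mapping codons to counts or RSCU values; the function name is not from the study.

```python
def p2_index(codon_values):
    """P2 = (WWC + SSU) / (WWY + SSY), with W = A/U, S = C/G and Y = C/U.

    codon_values maps codons (DNA or RNA alphabet) to counts or RSCU values.
    The formula is the commonly used form and is an assumption of this sketch.
    """
    weak, strong, pyrimidine = set("AT"), set("CG"), set("CT")
    wwc = ssu = wwy = ssy = 0.0
    for codon, value in codon_values.items():
        codon = codon.upper().replace("U", "T")
        if len(codon) != 3 or codon[2] not in pyrimidine:
            continue
        first_two, third = set(codon[:2]), codon[2]
        if first_two <= weak:        # both of the first two bases are A or T
            wwy += value
            if third == "C":
                wwc += value
        elif first_two <= strong:    # both of the first two bases are C or G
            ssy += value
            if third == "T":
                ssu += value
    denominator = wwy + ssy
    return (wwc + ssu) / denominator if denominator else float("nan")


# Made-up usage values; AGC is ignored because its first two bases are mixed.
example = {"AAC": 2, "AAU": 1, "GGU": 3, "GGC": 1, "AGC": 10}
print(p2_index(example))
```

Codons whose first two bases mix a weak and a strong nucleotide, or that end in a purine, do not contribute to the index.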

Codon context analysis
It was first observed in prokaryotic genes that codon pairs, like codons, exhibit a bias in occurrence 94. In another study, it was observed that codon pairs also influence the rate of translation. Overrepresented codon pairs are translated at a slower speed than pairs of underrepresented codons. The phenomenon is related to the compatibilities of adjacent tRNA isoacceptor molecules present on ribosomes participating in translation. Such results suggest co-evolution of the frequency of one codon with the next codon, with structural compatibilities and tRNA isoacceptor abundance as a measure to control translation rates 95. Furthermore, codon pair optimization and deoptimization have been proven to affect translation efficiency in several experiments deciphering the importance of codon context bias 96,97. In the present study, we performed codon context analysis using Anaconda 2 software.

Statistical analysis
Statistical analyses, such as Pearson correlation and regression analysis, were undertaken using PAST4 software.
Standard calculations, such as additions and subtractions, used in the skew and other analyses were performed in Microsoft Office 2010. Principal component analysis was undertaken using PAST4 software.

Figure 2. RSCU values of different codons in 18 depression associated gene sets show an underrepresentation of A/T ending codons.

Figure 3. Codon context analysis for depression-associated genes. Green portrays highly used codon pairs, whereas red represents lesser-used codon pairs. Pink depicts null usage of codons. Codons UGA and UAG were found paired with some specific codons. Statistically insignificant values are depicted in black.

Figure 4. (A) Regression analysis between average %GC content at codon positions one and two (%GC12) and %GC content at the third codon position (%GC3). (B) Parity plot with GC bias (G3/(G3 + C3)) on the abscissa and AT bias (A3/(A3 + T3)) on the ordinate. (C) ENc-GC3 analysis showing the presence of data points below the expected ENc curve, depicting the prevalence of selection force. (D) Regression between CAI and ENc (effective number of codons) revealed that 81.81% of the variation in CAI is attributed to ENc and thus to codon bias.

Table 2. RSCU values of individual genes.

Table 3. t-test analysis between RSCU values of depression-associated and housekeeping genes with 1000 bootstrap replicates (iterative resampling of the dataset with replacement). NS, non-significant. ***p < 0.0001.

Table 4. Nucleotide skew in relation to the 18 depression associated genes.

Table 5. Codon context analysis for top 15 overrepresented and rare codon pairs.

Table 6. Evaluation between nucleotide skew and protein properties. The lower triangle of the matrix shows Pearson's correlation coefficient, while the upper triangle shows the level of statistical significance.